Add opt-in OTel GenAI metrics (0067) #177
Merged
The OTel observer can now emit the metrics signal alongside its spans: two histogram instruments over provider calls, gated by a new enable_metrics flag (default off, independent of span emission). One records an LLM completion's input and output token counts; the other records the call duration, once per attempt under call-level retry and including a failed attempt (which carries error.type). Both draw from the per-attempt LlmRetryAttemptEvent, the LLM-span source, so metrics record even with spans disabled. The Meter comes from the configured MeterProvider (injectable; falls back to the OTel global no-op when none is set). Implements proposal 0067 (observability metrics), LLM path.
Advance the spec pin v0.67.0 -> v0.68.0 across the four sync points (submodule, __spec_version__, pyproject, conformance manifest) and the smoke assertion; regenerate the bundled AGENTS.md. Wire conformance fixtures 088 / 090 / 091 through a new metrics driver that captures observations via a private MeterProvider plus an in-memory MetricReader (the conformance-adapter metric-capture primitive); the embedding fixture 089 is deferred until the embedding capability lands. Teach the fixture-parser schema the new shapes (expected.metrics and the calls_embed node directive). Record proposal 0067 partial, document the enable_metrics flag, and add the CHANGELOG entry.
Pull request overview
Adds opt-in OpenTelemetry metrics emission to the bundled OTelObserver per accepted spec proposal 0067 (spec v0.68.0), alongside the usual span emission, and updates the spec pin + conformance harness to validate the new fixtures.
Changes:
- Add `enable_metrics` + `meter_provider` to `OTelObserver`, creating/recording two OA-namespaced GenAI histogram instruments from `LlmRetryAttemptEvent`.
- Extend conformance + unit tests to capture/validate emitted metrics via a private `MeterProvider` + `InMemoryMetricReader`, and add fixture-schema support (`expected.metrics`, `calls_embed` for deferred 089).
- Bump pinned spec version from 0.67.0 → 0.68.0 across runtime, pyproject, conformance manifest, docs, and changelog.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/openarmature/observability/otel/observer.py` | Implements opt-in metrics instruments + per-attempt recording for duration/token usage. |
| `tests/unit/test_observability_otel.py` | Adds unit tests asserting metrics emission, disabling behavior, span-independence, and retry attempt counting. |
| `tests/conformance/test_observability.py` | Wires new metrics fixtures (088/090/091) and adds a metrics fixture driver/capture/assertion helpers. |
| `tests/conformance/harness/expectations.py` | Extends observability expected schema with metrics. |
| `tests/conformance/harness/directives.py` | Adds `calls_embed` directive shape so deferred embedding fixture 089 parses/round-trips. |
| `docs/concepts/observability.md` | Documents `enable_metrics`, instruments, dimensions, and meter-provider behavior. |
| `tests/test_smoke.py` | Updates spec-version assertion to 0.68.0. |
| `pyproject.toml` | Updates `[tool.openarmature].spec_version` to 0.68.0. |
| `src/openarmature/__init__.py` | Updates `__spec_version__` to 0.68.0. |
| `src/openarmature/AGENTS.md` | Updates bundled agent-doc header to spec v0.68.0. |
| `conformance.toml` | Advances `spec_pin` and records proposal 0067 as partial with rationale. |
| `CHANGELOG.md` | Adds release note entry for OTel GenAI metrics and updates spec-pin summary. |
Implements accepted proposal 0067 (spec v0.68.0): adds the OpenTelemetry metrics signal to the bundled OTel observer, opt-in via `enable_metrics`. The pin advances v0.67.0 to v0.68.0 (0067 is the only proposal in the delta).

What changed
Two OA-namespaced histogram instruments over provider calls, recorded only when `enable_metrics=True` (default off):

- `openarmature.gen_ai.client.token.usage` (`{token}`): per LLM completion, two observations, the input and output token counts (tagged `openarmature.gen_ai.token.type`), from the response usage record. Recorded only when the call returned usage.
- `openarmature.gen_ai.client.operation.duration` (`s`): the provider-call wall-clock duration, once per attempt under call-level retry, including a failed attempt (which carries `error.type`).

Both carry `openarmature.gen_ai.operation` (`"chat"`), `gen_ai.request.model`, and `gen_ai.system`, with the spec's explicit bucket advisories. The `Meter` comes from the configured `MeterProvider` (injectable via `meter_provider=...`; falls back to the OTel global no-op when none is set). Metrics are independent of span emission: they record even with `disable_llm_spans=True`. Metrics target OTel only (no Langfuse mapping). The instrument names are OA-namespaced, mirroring the upstream `gen_ai.client.*` instruments (still at Development status), so a future cutover is a mechanical prefix-strip.

Implementation note
The proposal sources metrics from the typed completion/failure events and requires duration "once per attempt". In this implementation the per-attempt event is the internal `LlmRetryAttemptEvent` (the LLM-span source since 0050), which already carries latency, usage, error category, model, and provider, so metrics record from it: one duration sample per attempt, token usage only when usage is present. The terminal events are not used (they would double-count). This is the same internal-event latitude the spec blessed for the 0050 per-attempt span surface.

Embedding metrics deferred
The proposal's embedding-call metrics (fixture 089) are deferred: the embedding capability (proposal 0059) is unimplemented in Python until a later release, so there is no embedding event or provider to record from. `conformance.toml` records 0067 `partial` on that basis. The LLM-call fixtures (088 / 090 / 091) are implemented and wired through a private `MeterProvider` plus an in-memory `MetricReader` (the conformance-adapter metric-capture primitive); 089 rides the deferred set.

Tests
- […] `error.type` on failure, the disabled no-op, span-independence, and once-per-attempt-under-retry (asserting on histogram counts, since identical-dimension observations aggregate).
- […] `expected.metrics` field, its discriminator key, and a `calls_embed` node directive so 089 still round-trips.
- […] `mkdocs build --strict` clean.
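
The per-attempt recording rule from the implementation note (one duration sample per attempt, `error.type` on a failed attempt, token usage only when usage is present) can be sketched in plain Python as follows. This is an illustrative model under stated assumptions, not the observer's code: `AttemptEvent`, its field names, `observations_for`, and the `"input"` / `"output"` token-type values are hypothetical stand-ins, and the real `LlmRetryAttemptEvent` shape may differ.

```python
from dataclasses import dataclass
from typing import Optional

DURATION = "openarmature.gen_ai.client.operation.duration"
TOKEN_USAGE = "openarmature.gen_ai.client.token.usage"


@dataclass
class AttemptEvent:
    """Hypothetical stand-in for the per-attempt event; field names are assumed."""
    model: str
    provider: str
    latency_s: float
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    error_type: Optional[str] = None


def observations_for(event):
    """Return (instrument, value, attributes) tuples for one attempt."""
    base = {
        "openarmature.gen_ai.operation": "chat",
        "gen_ai.request.model": event.model,
        "gen_ai.system": event.provider,
    }
    # Duration is recorded for every attempt, failed or not.
    duration_attrs = dict(base)
    if event.error_type is not None:
        duration_attrs["error.type"] = event.error_type
    observations = [(DURATION, event.latency_s, duration_attrs)]
    # Token usage is recorded only when the attempt returned usage.
    if event.input_tokens is not None and event.output_tokens is not None:
        for token_type, count in (
            ("input", event.input_tokens),    # assumed tag value
            ("output", event.output_tokens),  # assumed tag value
        ):
            attrs = {**base, "openarmature.gen_ai.token.type": token_type}
            observations.append((TOKEN_USAGE, count, attrs))
    return observations


# One failed attempt followed by a successful retry.
failed = AttemptEvent("example-model", "example-provider", 0.2, error_type="timeout")
succeeded = AttemptEvent("example-model", "example-provider", 0.5,
                         input_tokens=10, output_tokens=3)
recorded = observations_for(failed) + observations_for(succeeded)
```

Recording from the per-attempt event alone yields two duration samples and two token observations for this retry sequence; also recording from a terminal completion event would count the final attempt twice, which is the double-counting the note refers to.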